Abstract
Human language and other animal communication systems tend to be optimized for efficiency—the benefits that they bestow relative to the costs of learning and producing them. One of the clearest manifestations of communicative efficiency is Menzerath’s law, which predicts that longer sequences (e.g., songs) will be comprised of shorter elements (e.g., notes). In this study, I assessed the evidence for Menzerath’s law in cetaceans by analyzing vocal sequences from 16 baleen and toothed whale species and comparing them to spoken data from 51 human languages. The vocalizations of 11 of the 16 whale species included in this analysis adhere to Menzerath’s law to an extent that is comparable to, and sometimes far greater than, what is observed in spoken human language data. Humpback whales exhibit Menzerath’s law both at the level of notes within phrases and phrases within songs. There is also a broad tendency for vocal shortening—elements or intervals getting shorter over the course of sequences—which may point to simple energetic constraints. Overall, the results of this study suggest that the vocalizations of a wide range of whale species have undergone compression for increased communicative efficiency.
Human language and other forms of animal communication exhibit striking parallels in their structure, such as hierarchical organization (24) and adherence to linguistic laws (25) in some songbirds. These features are thought to reflect common constraints that shape communication systems. Complexity, for example, is thought to boost the informativeness of signals by allowing them to take more forms (26), but more complex signals are generally harder to learn and produce (27). Communication systems tend to balance this trade-off by maximizing their efficiency—the ratio of the lifetime benefits that they provide to the costs of learning and producing them (28).
One of the clearest manifestations of communicative efficiency is Menzerath’s law, which predicts that longer sequences (e.g., songs, words) will be comprised of shorter elements (e.g., notes, phonemes) (29). The logic here is simple: when production costs are increased in one domain (e.g., sequence length) they should be decreased in another (e.g., element duration). This negative correlation between sequence length and element duration is found at various levels of analysis in language (e.g., phonemes in words, clauses in sentences) (30–33) as well as in music (34). Mathematical modeling work demonstrates that Menzerath’s law is a result of information compression (35–37).
In non-human communication, Menzerath’s law appears to be present in chimpanzee gesture (38) and in the vocalizations of some primates and birds (25,39–43). In cetaceans, though, communicative efficiency is relatively understudied. To my knowledge, Menzerath’s law has only been assessed in the whistle sequences of bottlenose dolphins, where it is present (44), and in the songs of pygmy blue whales, where it is absent (45). The aim of this study was to assess the evidence for Menzerath’s law in cetaceans by analyzing vocal sequences from 16 baleen and toothed whale species and comparing them to spoken data from 51 human languages.
I focus on Menzerath’s law, rather than other commonly-studied features like Zipf’s laws (46), because its predictions are agnostic about the categories of elements in sequences, thus expanding the number of species that it can be explored in. There are two reasons for this: (1) some species lack detailed classification schemes for their vocal behavior simply because they are understudied, and (2) some species produce rhythmic sequences comprised of one type of click or broadband sound. In these species (e.g., sperm whales, fin whales) the individual element durations are relatively uniform and information is thought to be encoded in the inter-element intervals (3,14,21). A study of gelada baboon vocalizations assessed Menzerath’s law using both elements and intervals and found that its strength was similar in both cases (35). In this study, I adopt Menzerath’s broader view of his law—“the greater the whole the smaller its parts” (29,47)—and fit it to the durations of either elements or intervals depending on which data are reported.
Cetacean vocal sequences have different names in different species (e.g., songs, codas, burst-pulses), and there is significant variation in research effort across taxa, so I used a mixture of different strategies to compile a convenience sample of candidate datasets. For heavily studied species I was able to find papers by using species-specific search term combinations like {“humpback whale” AND “song sequences”} and {“sperm whale” AND “codas”} on Google Scholar. For less represented taxa, like dolphins and porpoises, I also searched for datasets directly on repositories like Dryad, Zenodo, and Figshare. Within odontocetes (i.e., toothed whales), who produce clicks for echolocation, I only included vocalizations that have a known or hypothesized communication function (e.g., sperm whale codas, dolphin burst-pulses) (14,21).
In total, I found 43 studies that reported the durations of elements, or the intervals between elements, within vocal sequences. 13 of these had open data that were suitable for analysis. I emailed the corresponding authors of the remaining studies and was granted access to 10 closed datasets that were suitable for analysis. The final 23 datasets can be seen in Table 1 (1–23). Three of the datasets, two in humpback whales (6,7) and one in killer whales (16), were analyzed separately because they log the durations of higher-level units (e.g., for the humpbacks, phrases within songs rather than notes within phrases).
The phrase-level humpback whale dataset (8) was the only one that did not include the durations of individual elements or intervals in sequences. Instead, Owen et al. (8) report the sequences as strings of element categories, with a separate file that logs the durations of many different elements from each category. For this dataset, I interpolated the sequences with the median duration of each element category. Supplementary analysis with human language data suggests that interpolation with median durations systematically reduces the strength of Menzerath’s law, which should lead to more conservative conclusions (see Supplementary Information).
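The interpolation step can be sketched as follows. This is a minimal illustration with made-up category labels and durations, not the processing applied to the data of Owen et al. (8); the object names (`element_durations`, `sequences`, `interpolated`) are hypothetical.

```r
#hypothetical table of measured durations (in seconds) for each element category
element_durations <- data.frame(
  category = c("a", "a", "a", "b", "b", "c", "c", "c"),
  duration = c(0.8, 1.0, 1.5, 2.0, 2.4, 0.3, 0.5, 0.4)
)

#hypothetical sequences stored as vectors of element categories
sequences <- list(seq_1 = c("a", "b", "c"), seq_2 = c("c", "c", "a", "b", "a"))

#median duration of each element category
median_durations <- tapply(element_durations$duration, element_durations$category, median)

#replace each category label with the median duration of that category
interpolated <- do.call(rbind, lapply(names(sequences), function(id){
  data.frame(sequence = id,
             length = length(sequences[[id]]),
             duration = as.numeric(median_durations[sequences[[id]]]))
}))
```

Because every element of a category receives the same duration, within-category variation is discarded, which is one plausible reason why the interpolated data would show a weaker effect.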
The phylogeny in Figures 1 and 2 comes from a metatree of Cetacea comprised of both molecular and morphological data (48). As the phylogeny was primarily for visualization purposes, I assigned three species that do not appear in the metatree to close relatives in the same genus: the narrow-ridged finless porpoise (Neophocaena asiaeorientalis) to the Indo-Pacific finless porpoise (Neophocaena phocaenoides), the Commerson’s dolphin (Cephalorhynchus commersonii) to the Chilean dolphin (Cephalorhynchus eutropia), and the Peale’s dolphin (Lagenorhynchus australis) to the white-beaked dolphin (Lagenorhynchus albirostris).
As a comparison with the whale data, I also analyzed spoken language data from DoReCo (49), a corpus of ~500,000 annotated words (with phonemes) from 51 languages that focuses on small and endangered languages. DoReCo has been used in previous studies of Menzerath’s law (32). The only pre-processing was removing everything marked as an “exceptional speech event” (i.e., singing, disfluencies, silent pauses). For the main analysis, I followed (29) in using the durations of phonemes within words, but the results with the durations of words within sentences can be found in the Supplementary Information.
| Group | Species | Source | Open | Vocalization | Type |
|---|---|---|---|---|---|
| Baleen Whale | Blue Whale | Lewis et al. (2018) (1) | Yes | Songs | Durations |
| Baleen Whale | Bowhead Whale | Erbs et al. (2021) (2) | No | Songs | Durations |
| Baleen Whale | Fin Whale | Romagosa et al. (2024) (3) | Yes | Songs | Intervals |
| Baleen Whale | Fin Whale | Wood & Širović (2022) (4) | Yes | Songs | Intervals |
| Baleen Whale | Fin Whale | Best et al. (2022) (5) | Yes | Songs | Intervals |
| Baleen Whale | Humpback Whale | Schall et al. (2021) (6) | Yes | Songs | Durations |
| Baleen Whale | Humpback Whale | Schall et al. (2022) (7) | Yes | Songs | Durations |
| Baleen Whale | Humpback Whale | Owen et al. (2019) (8) | Yes | Phrases | Durations |
| Baleen Whale | Minke Whale | Martin et al. (2022) (9) | Yes | Call Sequences | Intervals |
| Baleen Whale | Right Whale | Crance et al. (2019) (10) | No | Songs | Durations |
| Baleen Whale | Sei Whale | Macklin et al. (2024) (11) | No | Call Sequences | Durations |
| Toothed Whale | Bottlenose Dolphin | Stepanov et al. (2023) (12) | No | Burst Pulses | Intervals |
| Toothed Whale | Commerson’s Dolphin | Martin et al. (2021) (13) | No | Burst Pulses | Intervals |
| Toothed Whale | Heaviside’s Dolphin | Martin et al. (2018) (14) | No | Burst Pulses | Intervals |
| Toothed Whale | Hector’s Dolphin | Nielsen et al. (2024) (15) | No | Burst Pulses | Intervals |
| Toothed Whale | Killer Whale | Selbmann et al. (2023) (16) | Yes | Call Sequences | Durations |
| Toothed Whale | Killer Whale | Sharpe et al. (2017) (17) | No | Calls | Durations |
| Toothed Whale | Narrow-Ridged Finless Porpoise | Terada et al. (2022) (18) | No | Burst Pulses | Intervals |
| Toothed Whale | Peale’s Dolphin | Martin et al. (2024) (19) | No | Burst Pulses | Intervals |
| Toothed Whale | Risso’s Dolphin | Arranz et al. (2016) (20) | Yes | Burst Pulses | Intervals |
| Toothed Whale | Sperm Whale | Hersh et al. (2022) (21) | Yes | Codas | Intervals |
| Toothed Whale | Sperm Whale | Vachon et al. (2022) (22) | Yes | Codas | Intervals |
| Toothed Whale | Sperm Whale | Gero et al. (2016) (23) | Yes | Codas | Intervals |
In this study, I focus on the Menzerath-Altmann law—a precise and more robust mathematical form of Menzerath’s law (47,50). Here is the standard form of the Menzerath-Altmann law where \(y\) is the duration of elements within a sequence comprised of \(x\) elements, and \(a\), \(b\), and \(c\) are parameters controlling the shape of the relationship.
\[\begin{equation} y = ax^{b}e^{cx} \;\;\textrm{(full model)} \tag{1} \end{equation}\]
\(c\) is usually close to 0 when this model is fit to empirical data (43), leading to a reduced model that is its most common form in contemporary linguistics (30).
\[\begin{equation} y = ax^{b} \;\;\textrm{(reduced model)} \tag{2} \end{equation}\]
With some simple algebra we can convert Equation (1) and Equation (2) into linear models.
\[\begin{equation} \ln(y) = \ln(a) + b\ln(x) + cx \;\;\textrm{(full model)} \tag{3} \end{equation}\]
\[\begin{equation} \ln(y) = \ln(a) + b\ln(x) \;\;\textrm{(reduced model)} \tag{4} \end{equation}\]
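To illustrate why the log transformation is useful, the sketch below simulates durations from the reduced model and recovers the parameters with ordinary linear regression on the log scale. The parameter values, noise level, and sample size are arbitrary assumptions, and the sketch ignores the varying intercepts used in the actual analysis.

```r
set.seed(1)
#assumed "true" parameters for the simulation
a <- 2
b <- -0.3
#sequence lengths (number of elements) between 2 and 20
x <- sample(2:20, size = 500, replace = TRUE)
#element durations following y = a * x^b with multiplicative log-normal noise
y <- a * x^b * exp(rnorm(length(x), mean = 0, sd = 0.1))
#fit the linearized reduced model: ln(y) = ln(a) + b * ln(x)
fit <- lm(log(y) ~ log(x))
#the intercept estimates ln(a), so exponentiate it to recover a
a_hat <- exp(unname(coef(fit)[1]))
#the slope estimates b directly
b_hat <- unname(coef(fit)[2])
```

A negative `b_hat` corresponds to Menzerath’s law: durations shrink as sequences get longer.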
I will use Equation (4) to enable direct comparison with previous studies of the Menzerath-Altmann law in non-human animals (12,35,40,41,43,44,51,52), and because the inclusion of \(x\) twice in Equation (3) leads to fairly severe problems with multicollinearity (\(\overline{\textrm{VIF}}\) = 17.2).
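The source of the multicollinearity can be seen in a small sketch: \(\ln(x)\) and \(x\) are strongly correlated over any realistic range of sequence lengths. The range of lengths below is an arbitrary assumption, so the resulting value will not match the mean VIF reported above.

```r
#sequence lengths between 2 and 20, as an arbitrary illustrative range
x <- 2:20
#with two predictors, VIF = 1 / (1 - R^2), where R^2 comes from
#regressing one predictor on the other
r_squared <- summary(lm(x ~ log(x)))$r.squared
vif <- 1 / (1 - r_squared)
vif
```

With only two predictors the VIF is the same for both, so a single value summarizes the problem.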
\(y\) is usually the mean duration of elements within sequences, but I will follow (25) in using the full distribution of element durations within sequences. This leads to similar estimates of \(a\) and \(b\) in linguistic corpora, helps to avoid spurious “regression to the mean” effects (35,53,54), and better captures uncertainty in the models (25). I also follow other work in excluding single-element sequences (i.e., with a length of one) from the analysis, which have been shown to depart from Menzerath’s law (38,50,55,56).
All models were fit using the lme4 (v1.1-35.1) package in R (v4.3.1) (57) with the BOBYQA optimizer. To enable direct comparison of fixed effects across different models, I used maximum likelihood and z-scored the sequence lengths and element or interval durations within species and languages (58). All reported models were manually checked for convergence.
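The within-species z-scoring can be sketched in base R as follows. The data are simulated, the column names are hypothetical, and the sketch assumes that the z-scoring is applied to the log-transformed values (raw z-scores can be negative and cannot be logged); the text above does not state the order of operations.

```r
set.seed(1)
#hypothetical data: two species with very different time scales
dat <- data.frame(
  species = rep(c("species_a", "species_b"), each = 50),
  length = sample(2:10, 100, replace = TRUE),
  duration = c(rlnorm(50, meanlog = 0, sdlog = 0.3), rlnorm(50, meanlog = 3, sdlog = 0.5))
)
#z-score a vector: subtract the mean and divide by the standard deviation
z_score <- function(v){(v - mean(v)) / sd(v)}
#apply the z-scoring separately within each species (assumption: on the log scale)
dat$log_duration_z <- ave(log(dat$duration), dat$species, FUN = z_score)
dat$log_length_z <- ave(log(dat$length), dat$species, FUN = z_score)
```

After this step every species has a mean of 0 and a standard deviation of 1 on both variables, so slopes are expressed in standard deviation units and can be compared across species.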
The main model used to test Menzerath’s law was Equation (4) with sequence ID as a varying intercept to account for the repeated measurements of durations within sequences. Some species had multiple datasets, in which case the study ID was included as a second varying intercept. Here is the main model in Wilkinson notation—standard R model syntax.
\[\begin{equation} \ln(\textrm{duration}) \sim \ln(\textrm{length}) + (1|\textrm{sequence}) \tag{5} \end{equation}\]
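As a rough illustration of how Equation (5) maps onto an lme4 call, the sketch below fits the model to simulated data. It is not the `menz_fit` function used in this study: the data, parameter values, and object names are assumptions, and the z-scoring step is omitted for brevity.

```r
set.seed(1)
#simulate hypothetical sequences: 200 sequences of 2 to 15 elements each
n_seq <- 200
lengths <- sample(2:15, n_seq, replace = TRUE)
#sequence-level varying intercepts
seq_effect <- rnorm(n_seq, mean = 0, sd = 0.2)
dat <- data.frame(
  sequence = factor(rep(seq_len(n_seq), times = lengths)),
  length = rep(lengths, times = lengths)
)
#durations follow the reduced Menzerath-Altmann law (a = 1, b = -0.3) plus noise
dat$duration <- exp(-0.3 * log(dat$length) + rep(seq_effect, times = lengths) +
                      rnorm(nrow(dat), mean = 0, sd = 0.2))

#fit equation (5) with maximum likelihood and the BOBYQA optimizer, if lme4 is installed
fit <- NULL
if(requireNamespace("lme4", quietly = TRUE)){
  fit <- lme4::lmer(log(duration) ~ log(length) + (1|sequence), data = dat, REML = FALSE,
                    control = lme4::lmerControl(optimizer = "bobyqa"))
  #the fixed effect of log(length) is the estimate of b
  slope <- lme4::fixef(fit)[["log(length)"]]
}
```

Setting `REML = FALSE` requests maximum likelihood, and a second varying intercept for study would be added as `+ (1|study)` for species with multiple datasets.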
Additionally, I fit a second model that included the position of each element or interval in the sequence as a fixed effect, following previous studies of Menzerath’s law in non-human animals (35,40,43,44,51,52). Position was normalized between 0 and 1 using the function \((n - 1)/(l - 1)\), where \(n\) is the position of the element or interval and \(l\) is the length of the sequence (43). The purpose of this model was to assess whether Menzerath’s law is driven by a shortening of elements or intervals over the course of the sequence, or a tendency to begin long sequences with shorter elements or intervals.
\[\begin{equation} \ln(\textrm{duration}) \sim \ln(\textrm{length}) + \textrm{position} + (1|\textrm{sequence}) \tag{6} \end{equation}\]
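The position normalization is simple enough to show directly. The function name `normalize_position` is hypothetical and is used here only for illustration.

```r
#normalize the position n of an element in a sequence of length l to the range [0, 1]
normalize_position <- function(n, l){(n - 1) / (l - 1)}
#positions in a hypothetical five-element sequence
positions <- normalize_position(n = 1:5, l = 5)
positions
```

The first element always maps to 0 and the last to 1, regardless of sequence length. The function is undefined for sequences of length one (0/0), which is consistent with excluding single-element sequences from the analysis.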
#fit models to datasets
#singular fit errors occur because the random effects terms are estimated near zero
#not a problem, especially as it seems to occur only with the null models
sperm_models <- menz_fit(sperm_data)
humpback_models <- menz_fit(humpback_data)
humpback_phrase_models <- menz_fit(humpback_phrase_data)
fin_models <- menz_fit(fin_data)
killer_models <- menz_fit(killer_data)
killer_sequence_models <- menz_fit(killer_sequence_data)
blue_models <- menz_fit(blue_data)
minke_models <- menz_fit(minke_data)
bowhead_models <- menz_fit(bowhead_data)
right_models <- menz_fit(right_data)
narrow_models <- menz_fit(narrow_data)
rissos_models <- menz_fit(rissos_data)
bottlenose_models <- menz_fit(bottlenose_data)
heavisides_models <- menz_fit(heavisides_data)
commersons_models <- menz_fit(commersons_data)
peales_models <- menz_fit(peales_data)
hectors_models <- menz_fit(hectors_data)
sei_models <- menz_fit(sei_data)
#load phylogenetic data
cetaceans <- ape::read.tree("data/phylo_lloyd_2021.tre")
#create table of all tips
ceta_tips <- data.frame(scientific = c("Megaptera_novaeangliae", #humpback
"Balaenoptera_musculus", #blue
"Balaenoptera_physalus", #fin
"Balaenoptera_acutorostrata", #minke
"Balaena_mysticetus", #bowhead
"Eubalaena_japonica", #right
"Balaenoptera_borealis", #sei whale
"Physeter_macrocephalus", #sperm
"Orcinus_orca", #killer
#"Phocoena_phocoena", #narrow-ridged finless porpoise (replaced species name with harbor porpoise bc does not exist in phylo)
"Grampus_griseus", #rissos dolphin
"Tursiops_truncatus", #bottlenose dolphin
"Cephalorhynchus_heavisidii", #heavisides dolphin
#"Cephalorhynchus_eutropia", #commersons dolphin (replaced species name with close relative bc does not exist in phylo)
#"Lagenorhynchus_albirostris", #peales dolphin (replaced species name with close relative bc does not exist in phylo)
"Cephalorhynchus_hectori"), #hectors dolphin
common = c("humpback",
"blue",
"fin",
"minke",
"bowhead",
"right",
"sei",
"sperm",
"killer",
#"narrow",
"rissos",
"bottlenose",
"heavisides",
#"commersons",
#"peales",
"hectors"))
#subset the original phylogeny to only include the relevant species
cetaceans <- ape::keep.tip(cetaceans, ceta_tips$scientific)
#overwrite with common name abbreviation for easy matching
cetaceans$tip.label <- ceta_tips$common[match(cetaceans$tip.label, ceta_tips$scientific)]
#format model estimates and standard errors for computing the phylogenetic signal
phylo_sig_data <- data.frame(est = c(summary(sperm_models$actual$reduced_scaled)$coef[2, 1],
summary(humpback_models$actual$reduced_scaled)$coef[2, 1],
summary(fin_models$actual$reduced_scaled)$coef[2, 1],
summary(killer_models$actual$reduced_scaled)$coef[2, 1],
summary(blue_models$actual$reduced_scaled)$coef[2, 1],
summary(minke_models$actual$reduced_scaled)$coef[2, 1],
summary(bowhead_models$actual$reduced_scaled)$coef[2, 1],
summary(right_models$actual$reduced_scaled)$coef[2, 1],
summary(narrow_models$actual$reduced_scaled)$coef[2, 1],
summary(rissos_models$actual$reduced_scaled)$coef[2, 1],
summary(bottlenose_models$actual$reduced_scaled)$coef[2, 1],
summary(heavisides_models$actual$reduced_scaled)$coef[2, 1],
summary(commersons_models$actual$reduced_scaled)$coef[2, 1],
summary(peales_models$actual$reduced_scaled)$coef[2, 1],
summary(hectors_models$actual$reduced_scaled)$coef[2, 1],
summary(sei_models$actual$reduced_scaled)$coef[2, 1]),
err = c(summary(sperm_models$actual$reduced_scaled)$coef[2, 2],
summary(humpback_models$actual$reduced_scaled)$coef[2, 2],
summary(fin_models$actual$reduced_scaled)$coef[2, 2],
summary(killer_models$actual$reduced_scaled)$coef[2, 2],
summary(blue_models$actual$reduced_scaled)$coef[2, 2],
summary(minke_models$actual$reduced_scaled)$coef[2, 2],
summary(bowhead_models$actual$reduced_scaled)$coef[2, 2],
summary(right_models$actual$reduced_scaled)$coef[2, 2],
summary(narrow_models$actual$reduced_scaled)$coef[2, 2],
summary(rissos_models$actual$reduced_scaled)$coef[2, 2],
summary(bottlenose_models$actual$reduced_scaled)$coef[2, 2],
summary(heavisides_models$actual$reduced_scaled)$coef[2, 2],
summary(commersons_models$actual$reduced_scaled)$coef[2, 2],
summary(peales_models$actual$reduced_scaled)$coef[2, 2],
summary(hectors_models$actual$reduced_scaled)$coef[2, 2],
summary(sei_models$actual$reduced_scaled)$coef[2, 2]),
species = c("sperm", "humpback", "fin", "killer", "blue", "minke", "bowhead", "right",
"narrow",
"rissos", "bottlenose", "heavisides",
"commersons", "peales",
"hectors", "sei"))
#compute and save phylogenetic signal with a p-value
phylo_signal <- phytools::phylosig(cetaceans, x = phylo_sig_data$est[match(cetaceans$tip.label, phylo_sig_data$species)], se = phylo_sig_data$err[match(cetaceans$tip.label, phylo_sig_data$species)], method = "K", test = TRUE)
save(phylo_signal, file = "models/phylo_signal.RData")
#store locations of human datasets
human_datasets <- list.files("data/doreco/")
human_datasets <- substr(human_datasets[grep("doreco", human_datasets)], 1, 15)
#get labels for plotting
human_dataset_labels <- as.character(sapply(human_datasets, function(x){gsub(" DoReCo dataset.*", "", gsub("^The ", "", readLines(paste0("data/doreco/", x, "_extended/", x, "_dataset-info.txt")[1])[1]))}))
#store the actual phonemic data files for each language
human_datasets <- paste0("data/doreco/", human_datasets, "_extended/", human_datasets, "_ph.csv")
#compile dataset of phonemes in words and phonemes in sentences
phonemes <- parallel::mclapply(human_datasets, read_phonemes, mc.cores = 7)
#run menzerath models on all of the datasets at the word level
#cannot parallelize because menz_fit includes parallelization
#if inf cumsum error happens you need to increase the multiple in the null model window calculation
phonemes_in_words <- lapply(1:length(phonemes), function(g){menz_fit(data = phonemes[[g]]$words, cores = 1)})
#run menzerath models on all of the datasets at the sentence level
words_in_sentences <- lapply(1:length(phonemes), function(g){menz_fit(data = phonemes[[g]]$sentences, cores = 1)})
#save phonemic data
save(phonemes, file = "data/doreco/phonemes.RData")
save(phonemes_in_words, file = "models/phonemes_in_words.RData")
save(words_in_sentences, file = "models/words_in_sentences.RData")
Finally, I assessed broader cross-species trends in Menzerath’s law with expanded forms of Equations (5) and (6) applied to all species at once. Interactions between length and position and the following two features were added: (1) the group the species comes from, to determine whether the effect varies between Mysticetes and Odontocetes, and (2) the type of vocalization, to determine whether the effect is stronger for elements or intervals. Group and type were not added as separate fixed effects (outside of the interactions) because the z-scaling of duration within species removes species differences. Sequence and study were included as varying intercepts. The effect of sequence length on element/interval duration does not have significant phylogenetic signal (\(K\) = 0.32; \(p\) = 0.46), computed using the method of (59) as implemented in the phytools package (v2.1.1) in R (v4.3.1) (60), so I did not include phylogeny in the modeling.
\[\begin{align*} \ln(\textrm{duration}) & \sim \ln(\textrm{length}) \\ & + \ln(\textrm{length}) : \textrm{group} + \ln(\textrm{length}) : \textrm{type} \\ & + (1|\textrm{sequence}) + (1|\textrm{study}) \tag{7} \end{align*}\]
\[\begin{align*} \ln(\textrm{duration}) & \sim \ln(\textrm{length}) \\ & + \ln(\textrm{length}) : \textrm{group} + \ln(\textrm{length}) : \textrm{type} \\ & + \textrm{position} \\ & + \textrm{position} : \textrm{group} + \textrm{position} : \textrm{type} \\ & + (1|\textrm{sequence}) + (1|\textrm{study}) \tag{8} \end{align*}\]
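To make the structure of the fixed effects in Equation (7) concrete, the sketch below builds the corresponding design matrix in base R. The four rows of data are made up, and the 0/1 coding (0 for mysticetes and elements, 1 for odontocetes and intervals) is an assumption about how the indicators are stored.

```r
#hypothetical data: one row per element or interval
dat <- data.frame(
  log_length = c(-1.2, 0.4, 0.9, -0.3),
  group = c(0, 0, 1, 1), #0 for mysticetes, 1 for odontocetes
  type = c(0, 1, 1, 0)   #0 for elements, 1 for intervals
)
#fixed-effects part of equation (7): interactions without group or type main effects
design <- model.matrix(~ log_length + log_length:group + log_length:type, data = dat)
colnames(design)
```

The design matrix contains an intercept, the slope for length, and two columns that let the slope differ by group and by type, but no columns that shift the intercept by group or type. Under this coding, a positive interaction coefficient means a weaker (less negative) slope for the category coded as 1.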
#combine data from all species to be analyzed in a single model
all_species_data <- list(sperm = sperm_data, humpback = humpback_data, fin = fin_data, killer = killer_data, blue = blue_data, minke = minke_data, bowhead = bowhead_data, right = right_data, narrow = narrow_data, rissos = rissos_data, bottlenose = bottlenose_data, heavisides = heavisides_data, commersons = commersons_data, peales = peales_data, hectors = hectors_data, sei = sei_data)
#groups: 0 for mysticetes, 1 for odontocetes; types: 0 for elements, 1 for intervals
groups <- c(1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0)
types <- c(1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0)
#run model
#positive interaction means that 1 has weaker ML than 0
all_species_model <- menz_compare(all_species_data, groups, types)
#print messages from the models to check convergence (only possible once the models are fit)
summary(all_species_model$base)$optinfo$conv$lme4$messages
summary(all_species_model$position)$optinfo$conv$lme4$messages
#save
save(all_species_model, file = "models/all_species_model.RData")
(43) recently found that Menzerath’s law can be detected in pseudorandom sequences of birdsong syllables that are forced to match the durations of real songs. (43) interpret their model as approximating simple motor constraints, while stronger effects in the real data would indicate additional mechanisms (e.g., communicative efficiency through behavioral plasticity). I originally planned to compare the strength of Menzerath’s law in the real data with simulated data from the model of (43), as I recently did for house finch song (25), but analyses of language data suggest that it is far too conservative a null model: none of the 51 languages in the DoReCo dataset exhibit Menzerath’s law to a greater extent than simulated data. Even though many whale species exhibit Menzerath’s law to a greater extent than simulated data from the null model of (43) (75%; 12 out of 16 species), I do not want to over-interpret this result given the pattern in the human data. Upon further reflection, I think that the fundamental assumption of the model of (43), that sequence durations are governed by motor constraints alone, is unlikely to apply to many species with more complex communication systems. In humpback whales and sperm whales, for example, there appears to be significant inter-individual variation in song and coda length depending on social context (21,61). More details about the exploratory analysis using the model of (43) can be found in the Supplementary Information.
In total, this analysis includes 610,182 elements and intervals from 65,492 sequences, 23 studies, and 16 species.
The strength of Menzerath’s law in baleen and toothed whale species can be seen in Figures 1 and 2, respectively. In all baleen whale species, except for the North Pacific right whale, there is a negative relationship between sequence length and element or interval duration consistent with Menzerath’s law. The results are more mixed for the toothed whale species, where only five of the nine exhibit Menzerath’s law. All three dolphins in the Cephalorhynchus genus, as well as killer whales, display a neutral or positive relationship between sequence length and element or interval duration.
#load libraries and data
library(ggtree)
cetaceans <- ape::read.tree("data/phylo_lloyd_2021.tre")
#create table of mysticetes tips
myst_tips <- data.frame(scientific = c("Megaptera_novaeangliae", #humpback
"Balaenoptera_musculus", #blue
"Balaenoptera_physalus", #fin
"Balaenoptera_acutorostrata", #minke
"Balaena_mysticetus", #bowhead
"Eubalaena_japonica", #right
"Balaenoptera_borealis"), #sei whale
common = c("Humpback Whale\n(Megaptera novaeangliae)",
"Blue Whale\n(Balaenoptera musculus)",
"Fin Whale\n(Balaenoptera physalus)",
"Common Minke Whale\n(Balaenoptera acutorostrata)",
"Bowhead Whale\n(Balaena mysticetus)",
"North Pacific Right Whale\n(Eubalaena japonica)",
"Sei Whale\n(Balaenoptera borealis)"),
img = c("humpback", "blue", "fin", "minke", "bowhead", "right", "sei"))
#create table of odontocete tips
odon_tips <- data.frame(scientific = c("Physeter_macrocephalus", #sperm
"Orcinus_orca", #killer
"Neophocaena_phocaenoides", #narrow-ridged finless porpoise (replaced species name with indo-pacific finless porpoise bc does not exist in phylo)
"Grampus_griseus", #rissos dolphin
"Tursiops_truncatus", #bottlenose dolphin
"Cephalorhynchus_heavisidii", #heavisides dolphin
"Cephalorhynchus_eutropia", #commersons dolphin (replaced species name with close relative bc does not exist in phylo)
"Lagenorhynchus_albirostris", #peales dolphin (replaced species name with close relative bc does not exist in phylo)
"Cephalorhynchus_hectori"), #hectors dolphin
common = c("Sperm Whale\n(Physeter macrocephalus)",
"Killer Whale\n(Orcinus orca)",
"Narrow-Ridged Finless Porpoise\n(Neophocaena asiaeorientalis)",
"Risso's Dolphin\n(Grampus griseus)",
"Bottlenose Dolphin\n(Tursiops truncatus)",
"Heaviside's Dolphin\n(Cephalorhynchus heavisidii)",
"Commerson's Dolphin\n(Cephalorhynchus commersonii)",
"Peale's Dolphin\n(Lagenorhynchus australis)",
"Hector's Dolphin\n(Cephalorhynchus hectori)"),
img = c("sperm", "killer", "narrow", "rissos", "dolphin", "heavisides", "commersons", "peales", "hectors"))
#subset the original phylogeny to only include the relevant species
mysticetes <- ape::keep.tip(cetaceans, myst_tips$scientific)
odonticetes <- ape::keep.tip(cetaceans, odon_tips$scientific)
#match up the tip labels with the image files
mysticetes$tip.label <- myst_tips$common[match(mysticetes$tip.label, myst_tips$scientific)]
mysticetes$file <- paste0("imgs/", myst_tips$img[match(mysticetes$tip.label, myst_tips$common)], ".svg")
odonticetes$tip.label <- odon_tips$common[match(odonticetes$tip.label, odon_tips$scientific)]
odonticetes$file <- paste0("imgs/", odon_tips$img[match(odonticetes$tip.label, odon_tips$common)], ".svg")
#generate colors for each species
colors <- hues::iwanthue(nrow(myst_tips)+nrow(odon_tips), hmin = 0, hmax = 360, cmin = 30, cmax = 80, lmin = 35, lmax = 80)
#set.seed(12345)
#set.seed(123)
set.seed(1234)
colors <- colors[sample(1:(nrow(myst_tips)+nrow(odon_tips)))]
#construct un-annotated phylogeny plot for mysticetes
myst_phylo_plot <- ggtree(mysticetes, branch.length = "none", layout = "roundrect")
myst_phylo_plot$data$file <- c(mysticetes$file, rep(NA, nrow(myst_phylo_plot$data) - nrow(myst_tips)))
myst_phylo_plot <- myst_phylo_plot +
geom_tiplab(aes(image = file, color = label), geom = "image", offset = 9, size = 0.06, align = TRUE) + xlim(NA, 13.5) +
geom_tiplab(aes(color = label), geom = "label", family = "Avenir", parse = FALSE, align = TRUE, size = 2.5) +
scale_color_manual(values = colors[1:nrow(myst_tips)]) + theme(legend.position = "none") + ylim(0.5, nrow(myst_tips))
#construct un-annotated phylogeny plot for odontocetes
odon_phylo_plot <- ggtree(odonticetes, branch.length = "none", layout = "roundrect")
odon_phylo_plot$data$file <- c(odonticetes$file, rep(NA, nrow(odon_phylo_plot$data) - nrow(odon_tips)))
odon_phylo_plot <- odon_phylo_plot +
geom_tiplab(aes(image = file, color = label), geom = "image", offset = 21, size = 0.05, align = TRUE) + xlim(NA, 28) +
geom_tiplab(aes(color = label), geom = "label", family = "Avenir", parse = FALSE, align = TRUE, size = 2.5) +
scale_color_manual(values = colors[(nrow(myst_tips)+1):(nrow(myst_tips)+nrow(odon_tips))]) + theme(legend.position = "none") + ylim(0.5, nrow(odon_tips))
#create plot labels for mysticetes
myst_labels <- c(label_maker(bowhead_data, intervals = TRUE),
label_maker(right_data),
label_maker(minke_data, intervals = TRUE),
label_maker(sei_data),
label_maker(blue_data),
label_maker(fin_data, intervals = TRUE),
label_maker(humpback_data))
#create plot labels for odontocetes
odon_labels <- c(label_maker(sperm_data, intervals = TRUE),
label_maker(narrow_data, intervals = TRUE),
label_maker(killer_data),
label_maker(peales_data, intervals = TRUE),
label_maker(bottlenose_data, intervals = TRUE),
label_maker(rissos_data, intervals = TRUE),
label_maker(heavisides_data, intervals = TRUE),
label_maker(commersons_data, intervals = TRUE),
label_maker(hectors_data, intervals = TRUE))
#add annotations to mysticetes
myst_phylo_plot <- myst_phylo_plot + annotate("text", label = myst_labels, x = rep(max(myst_phylo_plot$data$x), length(myst_labels)), y = (1:length(myst_labels))-0.35, hjust = 0, family = "Avenir", size = 2.2, lineheight = 0.8)
myst_phylo_plot <- myst_phylo_plot + annotate("text", label = "Baleen Whales (Mysticetes)", x = min(myst_phylo_plot$data$x)-0.65, y = myst_phylo_plot$data$y[which.min(myst_phylo_plot$data$x)], angle = 90, family = "Avenir", size = 3)
#add annotations to odontocetes
odon_phylo_plot <- odon_phylo_plot + annotate("text", label = odon_labels, x = rep(max(odon_phylo_plot$data$x), length(odon_labels)), y = (1:length(odon_labels))-0.35, hjust = 0, family = "Avenir", size = 2.2, lineheight = 0.8)
odon_phylo_plot <- odon_phylo_plot + annotate("text", label = "Toothed Whales (Odontocetes)", x = min(odon_phylo_plot$data$x)-1.4, y = odon_phylo_plot$data$y[which.min(odon_phylo_plot$data$x)], angle = 90, family = "Avenir", size = 3)
#match up the colors between phylogeny and menzerath's law plots based on the labels
color_matching <- data.frame(species = c(myst_phylo_plot$data$label[!is.na(myst_phylo_plot$data$label)], odon_phylo_plot$data$label[!is.na(odon_phylo_plot$data$label)]), color_code = colors[c(as.numeric(factor(mysticetes$tip.label)), as.numeric(factor(odonticetes$tip.label))+nrow(myst_tips))])
#create menzerath's law plots for each species
humpback_plot <- menz_plot(data = humpback_data, model = humpback_models, color = color_matching$color_code[grep("Humpback ", color_matching$species)])
fin_plot <- menz_plot(data = fin_data, model = fin_models, intervals = TRUE, color = color_matching$color_code[grep("Fin ", color_matching$species)])
blue_plot <- menz_plot(data = blue_data, model = blue_models, color = color_matching$color_code[grep("Blue ", color_matching$species)])
minke_plot <- menz_plot(data = minke_data, model = minke_models, intervals = TRUE, color = color_matching$color_code[grep("Minke ", color_matching$species)])
killer_plot <- menz_plot(data = killer_data, model = killer_models, color = color_matching$color_code[grep("Killer ", color_matching$species)])
sperm_plot <- menz_plot(data = sperm_data, model = sperm_models, intervals = TRUE, color = color_matching$color_code[grep("Sperm ", color_matching$species)])
bowhead_plot <- menz_plot(data = bowhead_data, model = bowhead_models, intervals = TRUE, color = color_matching$color_code[grep("Bowhead ", color_matching$species)])
right_plot <- menz_plot(data = right_data, model = right_models, color = color_matching$color_code[grep("Right ", color_matching$species)])
narrow_plot <- menz_plot(data = narrow_data, model = narrow_models, intervals = TRUE, color = color_matching$color_code[grep(" Porpoise", color_matching$species)])
rissos_plot <- menz_plot(data = rissos_data, model = rissos_models, intervals = TRUE, color = color_matching$color_code[grep("Risso's ", color_matching$species)])
bottlenose_plot <- menz_plot(data = bottlenose_data, model = bottlenose_models, intervals = TRUE, color = color_matching$color_code[grep("Bottlenose ", color_matching$species)])
heavisides_plot <- menz_plot(data = heavisides_data, model = heavisides_models, intervals = TRUE, color = color_matching$color_code[grep("Heaviside's ", color_matching$species)])
commersons_plot <- menz_plot(data = commersons_data, model = commersons_models, intervals = TRUE, color = color_matching$color_code[grep("Commerson's ", color_matching$species)])
peales_plot <- menz_plot(data = peales_data, model = peales_models, intervals = TRUE, color = color_matching$color_code[grep("Peale's ", color_matching$species)])
hectors_plot <- menz_plot(data = hectors_data, model = hectors_models, intervals = TRUE, color = color_matching$color_code[grep("Hector's ", color_matching$species)])
sei_plot <- menz_plot(data = sei_data, model = sei_models, color = color_matching$color_code[grep("Sei ", color_matching$species)])
#create and save full phylogeny plot for mysticetes
png("plots/myst_phylo.png", width = 6, height = nrow(myst_tips), units = "in", res = 600)
#cairo_pdf("plots/myst_phylo.pdf", width = 6, height = 10, family = "avenir")
right_panel <- cowplot::plot_grid(humpback_plot, fin_plot, blue_plot, sei_plot, minke_plot, right_plot, bowhead_plot, NULL,
ncol = 1, rel_heights = c(rep(1, nrow(myst_tips)), 0.35))
bottom_row <- cowplot::plot_grid(myst_phylo_plot, right_panel, rel_widths = c(1, 1))
bottom_row
dev.off()
#create and save full phylogeny plot for odontocetes
png("plots/odon_phylo.png", width = 6, height = nrow(odon_tips), units = "in", res = 600)
#cairo_pdf("plots/odon_phylo.pdf", width = 6, height = 10, family = "avenir")
right_panel <- cowplot::plot_grid(hectors_plot, commersons_plot, heavisides_plot, rissos_plot, bottlenose_plot, peales_plot, killer_plot, narrow_plot, sperm_plot, NULL,
ncol = 1, rel_heights = c(rep(1, nrow(odon_tips)), 0.35))
bottom_row <- cowplot::plot_grid(odon_phylo_plot, right_panel, rel_widths = c(1, 1))
bottom_row
dev.off()
Figure 1: The baleen whale (Mysticete) species included in the study (left), alongside the distribution of element/interval durations and sequence lengths (middle) and the slope of Menzerath’s law (right). Each point in the distribution plots (middle) marks the mean duration of elements/intervals, but the slopes on the right were computed from the full set of elements/intervals. The bars in the slope plots (right) mark the 95% confidence intervals around the point estimates.
Interestingly, the North Pacific right whales have four distinct clusters of sequences in Figure 1, which directly correspond to the four song types identified by (10). The strong positive relationship between sequence length and element duration appears to be driven by the distribution of these clusters. Menzerath’s law makes no predictions about different categories of sequences, but it is worth noting that when Equation (5) is computed separately on each song type the results vary (GS1-PF estimate: -0.11, 95% CI: [-0.17, -0.05]; GS4-DG estimate: 0.01, 95% CI: [-0.03, 0.04]; GS3-PU estimate: -0.03, 95% CI: [-0.05, 0]; GS2-TP estimate: 0.06, 95% CI: [0.04, 0.08]).
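The contrast between the pooled slope and the per-song-type slopes can be illustrated with a toy example. The sketch below is not the analysis code: it assumes a plain log-log linear regression as a stand-in for Equation (5) (which is defined earlier in the manuscript and may include terms not shown here), and the data, column names, and `slope_ci` helper are all hypothetical. It shows how clusters with different typical lengths and durations can produce a positive pooled slope even when every within-cluster slope is negative.

```r
#toy data: four hypothetical song types (simulated, not the right whale data)
set.seed(1)
toy <- do.call(rbind, lapply(1:4, function(k){
  #sequence lengths differ by type
  len <- sample((5*k):(5*k + 4), 200, replace = TRUE)
  #type-level offsets rise across types, but the within-type slope is -0.1
  log_dur <- 0.8*k - 0.1*(log(len) - mean(log(len))) + rnorm(200, sd = 0.01)
  data.frame(type = paste0("type_", k), length = len, duration = exp(log_dur))
}))

#slope of log(duration) on log(length), with a 95% confidence interval
#(assumption: a simple lm, not the model behind Equation (5))
slope_ci <- function(d){
  fit <- lm(log(duration) ~ log(length), data = d)
  c(estimate = unname(coef(fit)[2]), confint(fit)[2, ])
}

#pooled estimate across all types, then one estimate per type
pooled <- slope_ci(toy)
by_type <- t(sapply(split(toy, toy$type), slope_ci))
pooled
by_type
```

In this toy setup the pooled estimate is positive because the between-type differences dominate, while each row of `by_type` is negative.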
For humpback and killer whales, I also assessed Menzerath’s law using data from a higher level of analysis. In humpback whales, I found that the length of songs negatively predicted the duration of phrases (estimate = -0.25, 95% CI: [0.065, 0.377]), similar to the pattern for notes within phrases. Interestingly, in killer whales I found that the length of call sequences negatively predicted the duration of calls (estimate = -0.043, 95% CI: [-0.082, -0.004]), even though the situation is reversed for elements within calls.
Figure 2: The toothed whale (Odontocete) species included in the study (left), alongside the distribution of element/interval durations and sequence lengths (middle) and the slope of Menzerath’s law (right). Each point in the distribution plots (middle) marks the mean duration of elements/intervals, but the slopes on the right were computed from the full set of elements/intervals. The bars in the slope plots (right) mark the 95% confidence intervals around the point estimates.
Figure 3 shows a direct comparison between the strength of Menzerath’s law in the whale data and the spoken human language data (i.e., phonemes within words) from the DoReCo corpus (49), alongside the influence of the position of elements or intervals on their duration. The same results for words within sentences can be seen in the Supplementary Information. The 11 whale species that adhere to Menzerath’s law express it to an extent at least similar to that of the human languages, and sometimes to a much greater extent (e.g., humpback whales). The effect of the position of elements and intervals on their duration is much more variable. Human languages tend to have a positive relationship between position and duration, which means that elements are lengthened as sequences progress. Whales, on the other hand, appear to shorten elements over the course of sequences (see Table 2), but this varies dramatically across species.
Interestingly, there are several exceptions to Menzerath’s law in the human language data. Arapaho exhibits a positive effect of word length on phoneme duration (Figure 3), and Tabasaran, Sanzhi Dargwa, Pnar, English (recorded in southern England), Yongning Na, and Cabécar show no effect of sentence length on word duration (Supplementary Information). These exceptions come from a wide variety of language families (e.g., Algic, Nakh-Daghestanian, Austroasiatic, Indo-European, Sino-Tibetan, Chibchan) from North America, Europe, and Asia.
#get labels for plotting
human_dataset_labels <- list.files("data/doreco/")
human_dataset_labels <- substr(human_dataset_labels[grep("doreco", human_dataset_labels)], 1, 15)
human_dataset_labels <- as.character(sapply(human_dataset_labels, function(x){gsub(" DoReCo dataset.*", "", gsub("^The ", "", readLines(paste0("data/doreco/", x, "_extended/", x, "_dataset-info.txt")[1])[1]))}))
#extract effects for phonemes in words
phonemes_in_words_effects <- lapply(phonemes_in_words, extract_freq_effects)
#reformat words data in a format that is plottable
phonemes_in_words_plot_data <- data.frame(cbind(do.call(rbind, lapply(phonemes_in_words_effects, function(x){x$length})), do.call(rbind, lapply(phonemes_in_words_effects, function(x){x$position}))))
colnames(phonemes_in_words_plot_data) <- c("length_lower", "length_median", "length_upper", "position_lower", "position_median", "position_upper")
phonemes_in_words_plot_data$label <- human_dataset_labels
phonemes_in_words_plot_data <- phonemes_in_words_plot_data[order(phonemes_in_words_plot_data$length_median, decreasing = TRUE), ]
phonemes_in_words_plot_data$label[which(phonemes_in_words_plot_data$label == "Nǁng")] <- "Nllng" #special characters
#compute effects for whale data
whales_effects <- lapply(list(sperm_models, humpback_models, fin_models, killer_models, blue_models, minke_models, bowhead_models, right_models, narrow_models, heavisides_models, commersons_models, peales_models, hectors_models, rissos_models, bottlenose_models, sei_models), extract_freq_effects)
#reformat whale data in a format that is plottable
whales_plot_data <- data.frame(cbind(do.call(rbind, lapply(whales_effects, function(x){x$length})), do.call(rbind, lapply(whales_effects, function(x){x$position}))))
colnames(whales_plot_data) <- c("length_lower", "length_median", "length_upper", "position_lower", "position_median", "position_upper")
whales_plot_data$label <- c("Sperm Whale", "Humpback Whale", "Fin Whale", "Killer Whale", "Blue Whale", "Common Minke Whale", "Bowhead Whale", "North Pacific Right Whale", "Narrow-Ridged Finless Porpoise", "Heaviside's Dolphin", "Commerson's Dolphin", "Peale's Dolphin", "Hector's Dolphin", "Risso's Dolphin", "Bottlenose Dolphin", "Sei Whale")
whales_plot_data <- whales_plot_data[order(whales_plot_data$length_median, decreasing = TRUE), ]
#reorder everything for a single axis
whales_plot_data$x <- 1:nrow(whales_plot_data)
whales_plot_data$group <- 1
phonemes_in_words_plot_data$x <- (max(whales_plot_data$x)+1):(max(whales_plot_data$x)+nrow(phonemes_in_words_plot_data))
phonemes_in_words_plot_data$group <- 2
#combine whale data with word data
combined_words_plot_data <- rbind(phonemes_in_words_plot_data, whales_plot_data)
#generate plot of phonemes in words against whales, for length
combined_words_length_plot <- ggplot(combined_words_plot_data) +
geom_linerange(aes(x = x, ymin = length_lower, ymax = length_upper, color = factor(group))) +
geom_hline(aes(yintercept = 0), lty = "dashed") +
geom_vline(aes(xintercept = nrow(whales_plot_data) + 0.5), lty = "dotted") +
scale_y_continuous(limits = c(min(combined_words_plot_data$length_lower)*1.05, max(combined_words_plot_data$length_upper)*1.05),
#name = expression("95% CI for "~italic("b")~"(Strength of Menzerath's Law)")) +
name = "Effect of Length on Duration") +
scale_x_continuous(breaks = combined_words_plot_data$x, labels = combined_words_plot_data$label, name = NULL, limits = c(0, nrow(combined_words_plot_data) + 1), expand = c(0, 0)) +
scale_color_manual(values = c("#638ccc", "#ca5670"), labels = c("Whales", "Humans"), name = "Taxa") +
theme_linedraw(base_size = 8, base_family = "Avenir") + theme(axis.text.x = element_text(angle = 90, hjust = 0.99, vjust = 0.5), panel.grid.major.x = element_blank(), panel.grid.minor.x = element_blank())
#generate plot of phonemes in words against whales, for position
combined_words_position_plot <- ggplot(combined_words_plot_data) +
geom_linerange(aes(x = x, ymin = position_lower, ymax = position_upper, color = factor(group))) +
geom_hline(aes(yintercept = 0), lty = "dashed") +
geom_vline(aes(xintercept = nrow(whales_plot_data) + 0.5), lty = "dotted") +
scale_y_continuous(limits = c(min(combined_words_plot_data$position_lower)*1.05, max(combined_words_plot_data$position_upper)*1.05),
#name = expression("95% CI for "~italic("b")~"(Strength of Menzerath's Law)")) +
name = "Effect of Position on Duration") +
scale_x_continuous(breaks = combined_words_plot_data$x, labels = combined_words_plot_data$label, name = NULL, limits = c(0, nrow(combined_words_plot_data) + 1), expand = c(0, 0)) +
scale_color_manual(values = c("#638ccc", "#ca5670"), labels = c("Whales", "Humans"), name = "Taxa") +
theme_linedraw(base_size = 8, base_family = "Avenir") + theme(axis.text.x = element_text(angle = 90, hjust = 0.99, vjust = 0.5), panel.grid.major.x = element_blank(), panel.grid.minor.x = element_blank())
#export plot of phonemes in words
png("plots/word_level_effects.png", width = 8, height = 6, units = "in", res = 600)
cowplot::plot_grid(cowplot::plot_grid(combined_words_length_plot + theme(axis.text.x = element_blank(), legend.position = "none"), combined_words_position_plot + theme(legend.position = "none"), ncol = 1, align = "v", rel_heights = c(0.66, 1)), cowplot::get_legend(combined_words_length_plot), nrow = 1, rel_widths = c(1, 0.12))
dev.off()
# #export plot of phonemes in words
# png("plots/word_level_effects.png", width = 8, height = 4, units = "in", res = 600)
# #cairo_pdf("plots/effects.pdf", width = 8, height = 4, family = "avenir")
# combined_words_plot
# dev.off()
Figure 3: The 95% confidence intervals for the effect of sequence length (top; computed from Equation (5)) and position (bottom; computed from Equation (6)) on element/interval duration for the 16 whale species and 51 human languages. The human language data are comprised of phonemes within words.
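One way to read the intervals in Figure 3 is to sort each species or language by whether its interval lies below zero, overlaps zero, or lies above zero. The sketch below assumes that reading (it is not a criterion stated in this section) and uses a made-up data frame whose column names mirror `combined_words_plot_data`; the `classify_ci` helper and the values are hypothetical.

```r
#toy stand-in for combined_words_plot_data (values are made up for illustration)
toy_effects <- data.frame(
  label = c("A", "B", "C"),
  length_lower = c(-0.40, -0.05, 0.02),
  length_upper = c(-0.30, 0.03, 0.10)
)

#classify an interval by whether it falls below, overlaps, or falls above zero
#(hypothetical helper, not part of the analysis code)
classify_ci <- function(lower, upper){
  ifelse(upper < 0, "negative", ifelse(lower > 0, "positive", "overlaps zero"))
}

toy_effects$length_effect <- classify_ci(toy_effects$length_lower, toy_effects$length_upper)
toy_effects
```

Under this reading, a "negative" interval for length is consistent with Menzerath’s law, and a "positive" interval runs against it.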
Of the two models used to assess cross-species trends, the one that included both length and position best fit the data (Equation (8); \(\Delta AIC\) = 1846). The results of this model can be seen in Table 2. Overall, there is a strong negative effect of sequence length on the duration of elements and intervals, which is consistent with Menzerath’s law. The interaction between this effect and data type is positive, suggesting that Menzerath’s law is slightly weaker when data are comprised of intervals rather than elements. Additionally, there is a negative effect of position on element duration, indicating that elements tend to shorten as sequences progress. The interactions between position, group, and type suggest two things: toothed whales (Odontocetes) shorten later elements and intervals to a greater extent, and elements tend to get shortened more than intervals over the course of sequences. Importantly, these interactions are strong enough to neutralize the effect of position in some conditions. For example, the overall effect of position on duration in a baleen whale species (Mysticete, group = 0) with interval data (type = 1) would be only -0.005 (95% CI: [-0.018, 0.008]).
| Predictor | Effect | 2.5% | 97.5% | |
|---|---|---|---|---|
| Length | -0.342 | -0.364 | -0.319 | * |
| : Group | -0.003 | -0.027 | 0.021 | |
| : Type | 0.089 | 0.058 | 0.121 | * |
| Position | -0.067 | -0.073 | -0.061 | * |
| : Group | -0.036 | -0.040 | -0.032 | * |
| : Type | 0.062 | 0.055 | 0.069 | * |
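The conditional effects implied by Table 2 can be worked out by adding the relevant interaction terms to each main effect. The sketch below does this for every combination of group and type. It assumes the coding given in the text (group = 0 for Mysticetes, type = 1 for intervals) and that the table lists every term involving length and position. The variable names are hypothetical, and only point estimates are computed, because intervals for the summed effects require the covariance of the estimates, which the table does not report.

```r
#point estimates copied from Table 2
coefs <- c(length = -0.342, length_group = -0.003, length_type = 0.089,
           position = -0.067, position_group = -0.036, position_type = 0.062)

#all combinations of group (0 = Mysticete, 1 = Odontocete) and type (0 = elements, 1 = intervals)
conditions <- expand.grid(group = c(0, 1), type = c(0, 1))

#conditional effect = main effect + interaction terms that apply in each condition
conditions$length_effect <- coefs[["length"]] +
  coefs[["length_group"]]*conditions$group +
  coefs[["length_type"]]*conditions$type
conditions$position_effect <- coefs[["position"]] +
  coefs[["position_group"]]*conditions$group +
  coefs[["position_type"]]*conditions$type

#round to the precision used in the table
conditions <- round(conditions, 3)
conditions
```

The row with group = 0 and type = 1 gives a position effect of -0.005, matching the example in the text, while the length effect stays clearly negative in all four conditions.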
The vocalizations of 11 of the 16 whale species included in this analysis adhere to Menzerath’s law, suggesting that they have undergone compression for increased communicative efficiency. Among these 11 species, the strength of Menzerath’s law is comparable to, and sometimes far greater than, what is observed in spoken human language data. In the main text, I compared the whale sequences to phonemes within words because I was working with the smallest reported units for each species, but the results are similar for words within sentences (see Supplementary Information). For two species, humpback whales and killer whales, I was able to analyze sequences at two levels. Humpback whales exhibit Menzerath’s law for both notes within phrases and phrases within songs. Killer whales, on the other hand, only exhibit Menzerath’s law at the level of call sequences, as opposed to the elements comprising calls. When data from all 16 whale species are included in a single analysis, there is strong evidence for both Menzerath’s law and an effect of position: elements and intervals tend to be shortened over the course of sequences.
Several species produce vocalizations that do not adhere to Menzerath’s law: killer whales (at the level of elements within calls), North Pacific right whales, and the three Cephalorhynchus dolphin species. The fact that killer whale vocalizations exhibit Menzerath’s law in their call sequences, but not in the elements within calls, suggests that the former may be the more relevant level of analysis for communication (62). The results from the North Pacific right whales are more puzzling. The data used in this study are from the first documented recordings of song in any right whale species (10), and are comprised of four song types with fairly dramatic differences in sequence lengths and interval durations (see clusters in Figure 1). When Menzerath’s law is assessed separately on each song type, two display the expected negative relationship, one displays a neutral relationship, and one displays a positive relationship between sequence length and interval duration. One speculative explanation for the mixed results in North Pacific right whales is that the songs may be in an early stage of cultural evolution. (10) found only one clear case of different animals producing the same song type, and linguistic laws may emerge from repeated cultural transmission between individuals (63). The three Cephalorhynchus species in this study (Hector’s dolphins, Commerson’s dolphins, and Heaviside’s dolphins) all produce both narrowband high-frequency and broadband clicks that are thought to function in both echolocation and communication (13–15). I only analyzed the rapid burst pulses of these clicks that are associated with social behavior, but it is possible that the clicks’ use in echolocation reduces pressure for communicative efficiency. However, the burst pulses of the four other dolphin and porpoise species included in this study do adhere to Menzerath’s law, so this hypothesis only makes sense if the boundaries between echolocation and communication are fuzzier in Cephalorhynchus.
On a related note, Menzerath’s law does not appear to be universal in spoken language at the level of phonemes in words (Figure 3) or words within sentences (Supplementary Information), which is consistent with previous work on clauses in written sentences (30,64) and syllables in written words (50,65). Menzerath’s law in language, then, appears to be a statistical tendency rather than an absolute universal (66), as opposed to Zipf’s rank-frequency and brevity laws (67,68).
The shortening of elements and intervals later in sequences is an unexpected finding, as the opposite pattern is often (but not always) observed in birdsong (43,51) and human language (50) (see Figure 3). In fact, “final lengthening” is a well-studied linguistic phenomenon in which vowels are lengthened right before word, phrase, and sentence boundaries (69–71). One account for final lengthening is that it initially evolved to minimize the cost of switching from exhaling to inhaling between elements (72), and has subsequently been elaborated via cultural evolution to make the boundaries between elements easier to perceive (73). Both toothed and baleen whales have specialized adaptations that allow them to vocalize while holding their breath (74,75), which may release them from the specific motor constraints that drive final lengthening (76).
Another explanation comes from primates, where coppery titi monkeys, eastern grey gibbons, and gelada baboons shorten some aspects of their vocalizations over the course of sequences (elements for the first two, intervals for the third) (35,52). Longer vocalizations are more energetically costly (77–79), which is probably why humans and other mammals shorten their vocalizations as they fatigue (80–82). (35) and (52) hypothesize that vocal shortening later in sequences reflects this simple energetic constraint, and that it may even explain Menzerath’s law in some species. Other work in humans and birds supports the idea that Menzerath’s law has physical origins (43,55,56), a development that some have described as “liberating” after decades of debate about the origins of linguistic laws (83). In humans, Menzerath’s law appears to be stronger in spoken than in written language (55,56), and deafened canaries and zebra finches produce songs consistent with the law without hearing adult birds (43). If (35) and (52) are correct, then the presence of vocal shortening may point to a physical origin for Menzerath’s law in whale communication.
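The idea that vocal shortening alone could produce a Menzerath-like pattern can be illustrated with a toy simulation. The sketch below is not taken from (35) or (52) and is not part of the analysis code. It assumes that log duration declines linearly with position and that sequence length has no direct effect on duration; all parameter values and variable names are hypothetical.

```r
#toy simulation: duration depends only on position within a sequence
set.seed(42)
n_seq <- 500
lengths <- sample(2:20, n_seq, replace = TRUE)

#each element's log duration declines with its position, plus noise
sim <- do.call(rbind, lapply(seq_len(n_seq), function(i){
  pos <- seq_len(lengths[i])
  data.frame(seq_id = i, length = lengths[i], position = pos,
             duration = exp(-0.05*pos + rnorm(lengths[i], sd = 0.1)))
}))

#regressing on length alone gives a negative slope, because longer
#sequences contain more late (shortened) elements
length_only <- coef(lm(log(duration) ~ log(length), data = sim))[["log(length)"]]

#once position is included, it absorbs most of the apparent length effect
both <- coef(lm(log(duration) ~ log(length) + position, data = sim))
length_only
both
```

In this setup a negative relationship between sequence length and element duration appears without any direct effect of length, which is why it helps to model position alongside length.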
I would like to thank all first authors who contributed data to this study, either directly (via personal correspondence) or indirectly (by publishing open data): Leah Lewis, Florence Erbs, Miriam Romagosa, Megan Wood, Paul Best, Elena Schall, Clare Owen, Cameron Martin, Jessica Crance, Gabrielle Macklin, Arthur Stepanov, Morgan Martin, Nicoline Nielsen, Anna Selbmann, Deborah Sharpe, Tomoyoshi Terada, Patricia Arranz, Taylor Hersh, Felicia Vachon, and Shane Gero.
The analysis code, and all datasets that were made open access by the original authors, can be found on GitHub (https://github.com/masonyoungblood/whale_efficiency) and in the HTML form of the manuscript (https://masonyoungblood.github.io/whale_efficiency/). For access to the other datasets that are not publicly available, please reach out to the original authors (see Table 1).